Least squares credibility is usually derived from some fairly complicated looking assumptions
about risk across a collective. It turns out, however, that the basic results can be developed from
some standard statistical operations with weighted regression. This is outlined, and some more
advanced models are tied to the same approach, in this note.

CREDIBILITY THEORY FOR DUMMIES


Credibility theory is usually presented as a mathematically dense body of formulas. Here is
something a little different: a short, simple approach. “Dummies” is of course a relative term.
Algebra, differential calculus, and some background in statistics are all assumed.


What is credibility? 
Credibility theory is all about weighted averages. Different estimates of a quantity are to be
weighted together. The more credible estimates get more weight.


In the context of estimating expected losses for a member of a class, there are two natural
estimates: the experience of the member itself, and the average of the entire class. The former is
more relevant but also more volatile than the latter. Two general approaches have been taken to
calculating weights in this case. The limited fluctuation approach is willing to accept the member
experience at face value if it meets a pre-defined standard of stability (full credibility) and, if
not, reduces the weight enough for the weighted average to meet the stability requirement. The
greatest accuracy approach measures relevance as well as stability and looks for the weights that
will minimize an error measure. The average of the entire class could be a very stable quantity,
but if the members of the class tend to be quite different from each other, it could be of less
relevance for any particular member. So the relevance of a wider class average to any member’s
mean is inversely related to the variability among the members of the class.


The error measure used in the greatest accuracy approach is almost always expected squared
error, so this method is often called “least squares credibility.” In Europe it is sometimes called
“classical credibility.” The limited fluctuation approach is called classical in North America. Thus
“classical” is a term worth avoiding, not only because of its geographic ambiguity, but also
because it is a historical rather than a methodological description.

Least squares credibility
Suppose you have two independent estimates $x$ and $y$ of a quantity, with respective expected
squared errors $u$ and $v$. Take a weighted average $a = zx + (1-z)y$. The expected squared error of
$a$ is $w = z^2 u + (1-z)^2 v$. What $z$ minimizes $w$? Here is where the calculus comes in. The
derivative $dw/dz$ is $2zu + 2(z-1)v$. If you set that to zero you get $zu + zv = v$, or
$z = v/(u+v)$. Then $1-z$ is $u/(u+v)$. This makes it look like each estimate gets a weight
proportional to the expected squared error of the other. To express the weights as properties of
the estimates themselves, note that $(1/u)/[1/u + 1/v] = 1/[1 + u/v] = v/(u+v) = z$. This shows
that each estimate gets a weight proportional to the reciprocal of its expected squared error.¹
Least squares credibility is an application of this principle.
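The inverse expected squared error principle can be checked numerically. The following is a minimal Python sketch, not part of the original note; the function names and the values of u and v are illustrative assumptions.

```python
def optimal_weight(u: float, v: float) -> float:
    """Weight z on estimate x that minimizes w = z^2*u + (1-z)^2*v."""
    return v / (u + v)


def expected_squared_error(z: float, u: float, v: float) -> float:
    """Expected squared error of a = z*x + (1-z)*y for independent x and y."""
    return z ** 2 * u + (1 - z) ** 2 * v


# Illustrative (assumed) expected squared errors of x and y.
u, v = 4.0, 1.0
z = optimal_weight(u, v)

# The same weight expressed through reciprocals of the squared errors.
z_inverse_form = (1 / u) / (1 / u + 1 / v)

print(z, z_inverse_form, expected_squared_error(z, u, v))
```

With these values the noisier estimate x gets the smaller weight, and moving z slightly in either direction increases the expected squared error.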


As an example, consider a class of risks. Suppose the losses $L_{ij}$ in year $j$ for the $i$th
member of the class are randomly distributed as follows:

$L_{ij} = C + M_i + \varepsilon_{ij}$    (1)

where $C$ is the class mean loss, $C + M_i$ is the mean loss for the $i$th member, and
$\varepsilon_{ij}$ is the random component for the $j$th period for this member. It is not much of
a restriction to assume that the $M_i$'s average to zero, as do the $\varepsilon_{ij}$'s. Suppose
the variance of the $M_i$'s is $t^2$ and the variance of the random components $\varepsilon_{ij}$
is $s_i^2$ for every period $j$. Denote their average $E(s_i^2)$ by $s^2$.


Sometimes $t^2$ is called the variance of the hypothetical means and $s^2$ the expected process
variance. “Hypothetical” refers to the fact that the means $C + M_i$ are not observed.


With this setup, consider two estimates of the mean losses of member $i$: $x$, the average losses
of the member for $n$ periods, and $y$, the class mean loss $C$, which for now we will assume to
know or at least be able to estimate well enough to ignore the error. To apply the inverse variance
weightings, we need to know the expected squared errors of $x$ and $y$ from the true value of
$C + M_i$. By the definitions, $y$'s expected squared error is just $t^2$. The expected squared
error of $x$ is the expected value of its variance $s_i^2/n$, i.e., $s^2/n$. Then applying the
inverse expected squared error principle gives a weight to $x$ of
$z = (n/s^2)/[n/s^2 + 1/t^2] = n/[n + s^2/t^2]$. This is the original Bühlmann credibility formula.
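As a small illustration of the formula $z = n/[n + s^2/t^2]$, here is a Python sketch. The function names and all numerical inputs are assumptions made for the example, and the class mean is treated as known, as in the text.

```python
def buhlmann_z(n: int, s2: float, t2: float) -> float:
    """Credibility of n periods of member experience: n / (n + s^2/t^2)."""
    return n / (n + s2 / t2)


def credibility_estimate(x: float, class_mean: float, z: float) -> float:
    """Weighted average of member experience x and the class mean."""
    return z * x + (1 - z) * class_mean


# Assumed inputs: 5 periods, expected process variance 100,
# variance of hypothetical means 25, so s^2/t^2 = 4 and z = 5/9.
z = buhlmann_z(n=5, s2=100.0, t2=25.0)
estimate = credibility_estimate(x=120.0, class_mean=100.0, z=z)
print(z, estimate)
```

The estimate moves a fraction z of the way from the class mean toward the member's own average.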





1 This assumes the expected squared error is minimized rather than maximized at this $z$. The
second derivative of $w$ is $2u + 2v$, which is positive, so this assumption is valid.


The above would be an appropriate set of assumptions for a class where all members had
roughly the same exposure, such as single cars. If the exposure varies much across members, as
in territory ratemaking or commercial experience rating, the variances of the random components
could not reasonably be assumed to be constant over time. To address this case, introduce
an exposure measure $P_{ij}$ for the $i$th member in period $j$, and assume that the variance of
its random loss component is $P_{ij} s_i^2$, so each unit of exposure has a variance of $s_i^2$. In
this case it would not be right to assume that $M_i$ has mean zero, in that different members of
the class would depart from the class mean loss in differing amounts depending on exposure.
However, if in equation (1) $L$ is reinterpreted as losses per unit of exposure, i.e., pure
premium, this assumption could be reasonable. In that case, the variance of $\varepsilon_{ij}$
would be $s_i^2/P_{ij}$. So here, $x$ is the average loss per exposure for the $i$th member for $n$
periods, and $y$ is the mean pure premium for the class.


Thus the expected squared error of $y$ from $C + M_i$ would still be $t^2$. Assume further that $x$
is calculated as the sum of the $n$ period losses divided by the sum of the exposures. Use a “.” in
a subscript to denote summation, so the total exposures for the $i$th member over the $n$ periods
is $P_{i.}$. Then the variance of $x$ is just $P_{i.} s_i^2 / P_{i.}^2 = s_i^2/P_{i.}$, with
expected value $s^2/P_{i.}$. So what is the credibility of the pure premium? The inverse expected
squared error weighting gives $z_i = (P_{i.}/s^2)/[P_{i.}/s^2 + t^{-2}] = P_{i.}/[P_{i.} + s^2/t^2]$.
This is often expressed more simply as $z = P/[P+K]$, which is the Bühlmann-Straub credibility
formula.
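The exposure-weighted case can be sketched in Python as follows. This is an illustrative sketch only: the function name, the loss and exposure figures, and the values of s^2 and t^2 are assumptions, and the class pure premium is taken as known.

```python
def buhlmann_straub(losses, exposures, class_pure_premium, s2, t2):
    """Return (z, estimate) for one member under z = P / (P + K).

    losses, exposures: per-period totals for the member.
    s2: expected process variance per unit of exposure.
    t2: variance of the hypothetical means.
    """
    total_exposure = sum(exposures)          # P, the summed exposure
    x = sum(losses) / total_exposure         # member pure premium
    k = s2 / t2                              # K = s^2 / t^2
    z = total_exposure / (total_exposure + k)
    return z, z * x + (1 - z) * class_pure_premium


# Assumed data: three periods of losses and exposures for one member.
z, estimate = buhlmann_straub(
    losses=[50.0, 120.0, 130.0],
    exposures=[10.0, 20.0, 30.0],
    class_pure_premium=4.0,
    s2=40.0,
    t2=2.0,
)
print(z, estimate)  # P = 60, K = 20, member pure premium = 5.0
```

Here the member's pure premium is the total losses over the total exposure, matching the definition of x in the text, rather than an unweighted average of the yearly pure premiums.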


$C$ can be estimated by a weighted average of the $x$'s, the member means. The expected squared
error of $x$ from $C$ is $t^2 + s^2/P_{i.}$, so $x$ should get a weight inversely proportional to
that, so proportional to $t^{-2} P_{i.}/[P_{i.} + s^2/t^2]$, which is proportional to $z_i$. Thus
$C$ can be estimated as a weighted average of the $x$'s where the weights for each member are
proportional to the member’s credibility.
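A short Python sketch of the credibility-weighted class mean follows. The function name and the member data are illustrative assumptions; the sketch also checks that the credibilities are proportional to the reciprocals of $t^2 + s^2/P$.

```python
def credibility_weighted_class_mean(member_means, exposures, s2, t2):
    """Estimate C as the average of member means weighted by z_i = P_i/(P_i + K)."""
    k = s2 / t2
    z = [p / (p + k) for p in exposures]
    c_hat = sum(zi * xi for zi, xi in zip(z, member_means)) / sum(z)
    return c_hat, z


# Assumed data for three members of a class.
member_means = [6.0, 5.0, 4.0]
exposures = [20.0, 60.0, 180.0]
s2, t2 = 40.0, 2.0

c_hat, z = credibility_weighted_class_mean(member_means, exposures, s2, t2)

# Inverse expected squared errors of each member mean as an estimate of C.
inverse_errors = [1.0 / (t2 + s2 / p) for p in exposures]
ratios = [w / zi for w, zi in zip(inverse_errors, z)]  # constant, equal to 1/t^2

print(c_hat, z, ratios)
```

Because the ratio of the two sets of weights is the same for every member, normalizing either set gives the same estimate of C.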


What has been lost by the simplified approach? First, instead of (1), $L_{ij}$ is often considered
to be a conditional process with a parameter, say $q_i$, and a conditional mean and variance given
the parameter. The conditional means are assumed to average to the class mean $C$ with a variance
$t^2$, and the conditional variances average to $s^2$. Then defining $M_i$ as the conditional mean
less $C$ is equivalent to the additive formulation (1). The full usual derivation does get an
additional result: a weighted average of member means is the best linear combination of any sort
of the individual member observations by time period. However, this is a fairly general statement
itself, and it might be true in general separate from the credibility formulation.


Both this formulation and the usual credibility derivation ignore the estimation error for $C$ in
the credibility formula. Empirical Bayes theory addresses this issue, which does make a difference
in small samples. It might be possible to get the empirical Bayes results from the inverse squared
error principle as well.


Beyond Bühlmann-Straub: Large vs. small risk differences

The assumption that each unit of exposure generates the same amount of loss variance is
sometimes described as assuming that a large risk behaves like an independent combination of small
risks. Hewitt in his 1967 paper presented some data showing this was not the case.² Actually,
large risks have more variance than would be expected from treating them as independent
combinations of smaller risks. One thing that contributes to this is that risk conditions change
over time. Size of exposure does not provide much stability against changing economic and business
sector conditions. A way to model this would be to assume that the variance of the observed loss
for each risk for each period has the usual component that increases with risk size plus another
component that increases with risk size squared, i.e., assume that the loss variance is
$P_{ij}^2 w^2 + P_{ij} s_i^2$. Then the variance of the pure premium would be
$w^2 + s_i^2/P_{ij}$.


The credibility formula now gets more complicated, but is not too bad in the special case where
there is just one time period. With the inverse expected squared error formula,
$z = [P_{i.}/(P_{i.} w^2 + s^2)]/[P_{i.}/(P_{i.} w^2 + s^2) + t^{-2}] = P_{i.}/[P_{i.} + P_{i.} w^2/t^2 + s^2/t^2]$.
This could be written as $z = P/(P + AP + K)$, with $A = w^2/t^2$ and $K = s^2/t^2$. For larger
values of $P$ this makes the denominator larger, so decreases the credibility compared to
$P/[P+K]$.
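The effect of the extra term can be seen in a brief Python sketch. The function names and the values of A and K are illustrative assumptions, with A standing for $w^2/t^2$ and K for $s^2/t^2$ in the single-period case.

```python
def z_standard(p: float, k: float) -> float:
    """Buhlmann-Straub credibility P / (P + K)."""
    return p / (p + k)


def z_with_size_variance(p: float, a: float, k: float) -> float:
    """Credibility P / (P + A*P + K) when loss variance has a P^2 component."""
    return p / (p + a * p + k)


# Assumed parameters for illustration.
A, K = 0.25, 20.0

for p in (20.0, 200.0, 2000.0, 2.0e6):
    print(p, z_standard(p, K), z_with_size_variance(p, A, K))
```

Under these assumptions the standard formula approaches 1 as P grows, while the modified formula stays below 1/(1 + A), so even very large risks never reach full credibility.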


In this case risk stability is a more complicated function of exposure than in the original model. 


2 Loss Ratio Distributions - A Model, PCAS LIV.

In experience rating workers compensation another phenomenon has sometimes been observed:
the large risks’ mean losses per exposure are less different from the overall mean than are the
small risks’. This could be a matter of regulation, where large risks must follow more safety
precautions, but other reasons are possible. Whatever causes this phenomenon, the result is that
the variance of $M_i$ (i.e., the variance among risk means) also becomes a function of exposure.
Since it is the smaller risks that have more potential for large departures from the overall
average pure premium, this average becomes less relevant for the small risks, which increases the
credibility of their own experience. A reasonable formula for the variance among risk means in
this situation might be $t^2 + v^2/P_{i.}$ in the single time period case for member $i$.
Suppressing the subscripts on $P$, $z$ becomes
$z = [P/(Pw^2 + s^2)]/[P/(Pw^2 + s^2) + P/(Pt^2 + v^2)] = (Pt^2 + v^2)/[Pw^2 + s^2 + Pt^2 + v^2]$.
This can be simplified to $z = (P + B)/(P + AP + K + B)$, where $B = v^2/t^2$, $A = w^2/t^2$, and
$K = s^2/t^2$. The extra $B$ in the numerator and denominator increases $z$, especially for smaller
risks where $P$ is smaller, which is what was anticipated.
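A brief Python sketch of the formula $z = (P + B)/(P + AP + K + B)$ shows how B lifts the credibility of small risks more than large ones. The function name and the parameter values are illustrative assumptions.

```python
def z_general(p: float, a: float, k: float, b: float) -> float:
    """Credibility (P + B) / (P + A*P + K + B); b = 0 recovers P / (P + A*P + K)."""
    return (p + b) / (p + a * p + k + b)


# Assumed parameters for illustration.
A, K, B = 0.25, 20.0, 5.0

for p in (20.0, 200.0, 2000.0):
    without_b = z_general(p, A, K, 0.0)
    with_b = z_general(p, A, K, B)
    # The lift from B shrinks as exposure P grows.
    print(p, without_b, with_b, with_b - without_b)
```

Setting b to zero reproduces the previous formula, so the three models nest inside one another.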


When linear estimates don’t work 

So far this discussion has been non-parametric. That is, the forms of the distributions have not
entered in. That is the advantage of linear estimates with squared error penalties. If you have
some information about the type of distribution available, you can give up the restriction to
linear functions. In a Bayesian framework the class experience becomes the prior distribution for
the member experience, and then the Bayesian conditional expected value of the member mean
given the data is the least squares estimator of the member mean of any sort, linear or not. In
some cases the conditional mean is a linear function of the data (e.g., normal and gamma
distributions) so the linear restriction of credibility theory does not reduce the accuracy.
However, in highly skewed distributions, like some lognormal cases, the Bayes estimate is highly
non-linear, and credibility weighting can give large errors for classes with small means.


If the distribution type is fairly well understood, Bayesian methods would be preferable in such
cases. However, an alternative when the member means can be very different from each other is
to do the usual credibility estimation in the logs of the data, then exponentiate the results. This
introduces a downward bias, however, which has to be adjusted multiplicatively to balance to the
overall data.
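One way the log-based procedure might be sketched in Python is shown below. Several choices here are assumptions, not taken from the note: the credibilities z are supplied as inputs, the class mean in log space is taken as the credibility-weighted average of the member log means, and "balancing to the overall data" is read as rescaling so that exposure-weighted estimates reproduce the exposure-weighted observed total. The function name and the data are hypothetical.

```python
import math


def log_credibility_estimates(member_means, exposures, z):
    """Credibility-weight in logs, exponentiate, then rebalance multiplicatively.

    member_means must be strictly positive so that logs exist.
    Returns (balanced_estimates, balance_factor).
    """
    logs = [math.log(x) for x in member_means]
    # Assumed choice: class mean of the logs, weighted by credibility.
    log_class = sum(zi * li for zi, li in zip(z, logs)) / sum(z)
    raw = [math.exp(zi * li + (1 - zi) * log_class) for zi, li in zip(z, logs)]
    # Single multiplicative factor that balances to the overall observed data.
    observed_total = sum(p * x for p, x in zip(exposures, member_means))
    raw_total = sum(p * r for p, r in zip(exposures, raw))
    factor = observed_total / raw_total
    return [factor * r for r in raw], factor


# Assumed data: widely differing member pure premiums.
member_means = [0.5, 4.0, 40.0]
exposures = [20.0, 60.0, 180.0]
z = [0.5, 0.75, 0.9]

estimates, factor = log_credibility_estimates(member_means, exposures, z)
print(estimates, factor)
```

After rebalancing, the exposure-weighted total of the estimates equals the exposure-weighted total of the observed member means; the size and direction of the factor depend on the data.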